53 results found.
Written
Corpus,
Language Type:
Multilingual
Languages:
Bengali Gujarati Hindi Kannada Malayalam Marathi Punjabi Sindhi Sinhala Tamil Telugu Urdu
Availability:
Freely Available
License:
CreativeCommons
Size:
2 GByte
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset
- Paper track: Written/oral presentation
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Brian Roark | Dakshina dataset | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Multilingual
Languages:
Bengali Hindi Malayalam
Availability:
From Owner
License:
Size:
77 hours
Production Status:
Newly created-finished
Use:
Speech Synthesis
- Paper title: IndicSpeech: Text-to-Speech Corpus for Indian Languages
- Paper track: Speech/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Rudrabha Mukhopadhyay | IndicSpeech: Text to Speech corpus for Indian Languages | /N |
Documentation:
Yes, public documentation in English
Speech
Corpus,
Language Type:
Multilingual
Languages:
Bengali
Availability:
From Owner
License:
N/A
Size:
21.64 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Developing the Bangla RST Discourse Treebank
- Paper track: Infrastructural Issues/Large Projects
- Paper status: Accept Oral
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Debopam Das | University of Potsdam | DE |
| Author 2 | Manfred Stede | University of Potsdam | DE |
| Main Contact | Debopam Das | University of Potsdam | None |
Documentation:
Das, B., Mandal, S., and Mitra, P. (2011). Bengali speech corpus for continuous automatic speech recognition system. In Proceedings of Conf. Speech Database and Assessments (Oriental COCOSDA), pages 51–55.
Written
Treebank,
Language Type:
Monolingual
Languages:
Bengali Chinese English Filipino Hindi Indonesian Japanese Khmer Lao Malay Myanmar Thai Vietnamese
Availability:
Freely Available
License:
CreativeCommons
Size:
20106 sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation
- Paper track: Long paper/
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Abhisek Chakrabarty | Asian Language Treebank Parallel Corpus | /N |
Documentation:
http://www2.nict.go.jp/astrec-att/member/mutiyama/ALT/ALT-Parallel-Corpus-20191206/README.txt
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Bilingual
Languages:
Arabic Bengali Chinese English Hindi Korean Russian Thai Urdu
Availability:
From Data Center(s)
License:
LDC
Size:
595 hours
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2006 NIST Speaker Recognition Evaluation Training Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Dari English German Hindi Iranian Persian Japanese Korean Mandarin Chinese Persian Russian Spanish Standard Arabic Tamil Thai Vietnamese Yue Chinese
Availability:
From Owner
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2007 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Bengali Czech Dari English Hindi Lao Mandarin Chinese Mesopotamian Arabic Moroccan Arabic North Levantine Arabic Panjabi Persian Polish Pushto Russian Slovak South Levantine Arabic Spanish Standard Arabic Tamil Thai Turkish Ukrainian Urdu
Availability:
From Owner
License:
LDC
Size:
204 hours
Production Status:
Existing-used
Use:
Language Identification
- Paper title: Metric learning loss functions to reduce domain mismatch in the x-vector space for language recognition
- Paper track: 4.1 Language identification and verification, lang/Oral Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Raphaël Duroselle | 2011 NIST Language Recognition Evaluation Test Set | /N |
Documentation:
None
Speech/Written
Corpus,
Language Type:
Multilingual
Languages:
Amharic Bengali Cantonese Georgian Javanese Lao Vietnamese Zulu
Availability:
From Data Center(s)
License:
LDC
Size:
80 GByte
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: Zero-shot Cross-Lingual Phonetic Recognition with External Language Embedding
- Paper track: 8.11 Cross-lingual and multilingual/accent aspects/Poster Presentation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Heting Gao | IARPA Babel Language Pack | /N |
Documentation:
None




